Pointwise Prediction and Sequence-Based Reranking for Adaptable Part-of-Speech Tagging
Authors
Abstract
This paper proposes an accurate method for part-of-speech (POS) tagging that is highly domain-adaptable. The method is based on the assumption that POS transition tendencies do not depend on the domain, and it has the following three characteristics: 1) it is trainable from partially annotated data, 2) it uses efficiently trainable pointwise POS taggers to allow for active learning, and 3) it is more accurate than either pointwise or sequence-based POS taggers alone. The proposed method estimates POS tags by stacking pointwise and sequence-based predictors. In the experiments we deal with the joint problem of word segmentation and POS tagging in Japanese. We show that the proposed stacking process improves over pointwise and sequence-based methods (hidden Markov models and conditional random fields) in both the general domain and the target domain. In addition, we show the learning curve in a domain adaptation scenario. The results show that our method outperforms state-of-the-art methods both in the domain of the training data and in domain adaptation settings.
Keywords: Active learning; Reranking; Word segmentation; Part-of-speech tagging; Pointwise prediction
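The abstract does not spell out how the pointwise and sequence-based predictors are stacked, so the following is only a minimal sketch of one plausible reading: per-token tag probabilities from a pointwise tagger are combined with domain-independent tag-bigram transition scores, and the best tag sequence is found by Viterbi search. The function name `rerank_with_transitions`, the toy tag set, and all probabilities are hypothetical, and the sketch ignores word segmentation entirely.

```python
import math

UNSEEN = math.log(1e-6)  # assumed floor score for missing entries


def rerank_with_transitions(pointwise_logp, trans_logp, tags):
    """Viterbi search over pointwise log-probs plus tag-bigram log-probs.

    pointwise_logp: one dict per token, tag -> log P(tag | local context)
    trans_logp:     dict (prev_tag, tag) -> log P(tag | prev_tag)
    """
    # best[i][t] = (score of best path ending in tag t at token i, backpointer)
    best = [{t: (pointwise_logp[0].get(t, UNSEEN), None) for t in tags}]
    for i in range(1, len(pointwise_logp)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + trans_logp.get((p, t), UNSEEN))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + pointwise_logp[i].get(t, UNSEEN), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(best) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Toy data (hypothetical): the pointwise tagger slightly prefers VERB
# for the ambiguous middle token.
tags = ["DET", "NOUN", "VERB"]
pointwise_logp = [
    logs({"DET": 0.9, "NOUN": 0.05, "VERB": 0.05}),
    logs({"NOUN": 0.45, "VERB": 0.55}),
    logs({"NOUN": 0.2, "VERB": 0.8}),
]
trans_logp = logs({
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05,
    ("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.7,
    ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
})

# Pointwise decision: independent argmax per token.
pointwise = [max(scores, key=scores.get) for scores in pointwise_logp]
# Stacked decision: pointwise scores rescored with transition scores.
reranked = rerank_with_transitions(pointwise_logp, trans_logp, tags)
print(pointwise)
print(reranked)
```

In this toy case the transition scores overturn the pointwise choice for the middle token, which illustrates why transition tendencies learned in one domain could help a pointwise tagger adapted to another.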
Similar resources
Low-Dimensional Discriminative Reranking
The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction ...
Mandarin Part-of-Speech Tagging and Discriminative Reranking
We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word by making use of the current and next tags. In addition, we utilize the RankBo...
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
In this paper, we describe a new reranking strategy named word lattice reranking, for the task of joint Chinese word segmentation and part-of-speech (POS) tagging. As a derivation of the forest reranking for parsing (Huang, 2008), this strategy reranks on the pruned word lattice, which potentially contains many more candidates while using less storage, compared with the traditional n-best list ...
Using Part-of-Speech Reranking to Improve Chinese Word Segmentation
Chinese word segmentation and Part-of-Speech (POS) tagging have been commonly considered as two separate tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter and a tagger model separately based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approx...
Persian Part-of-Speech Tagging Using a Fuzzy Network Model
Part-of-speech tagging (POS tagging) is an ongoing area of research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...